AITopics

Country: North America (0.28)

Genre: Research Report (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Neural Information Processing SystemsDec-25-2025, 07:13:27 GMT

Data augmentation for efficient learning from parametric experts

We present a simple, yet powerful data-augmentation technique to enable data-efficient learning from parametric experts for reinforcement and imitation learning. We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert or expert policy to inform the behavior of a student policy. This setting arises naturally in a number of problems, for instance as variants of behavior cloning, or as a component of other algorithms such as DAGGER, policy distillation or KL-regularized RL. Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories, thus dramatically reducing the environment interactions required for successful cloning of the expert. We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degrees-of-freedom control problems. We demonstrate the benefit of our method in the context of several existing and widely used algorithms that include policy cloning as a constituent part. Moreover, we highlight the benefits of our approach in two practically relevant settings (a) expert compression, i.e. transfer to a student with fewer parameters; and (b) transfer from privileged experts, i.e.

data augmentation, efficient learning, parametric expert, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Neural Information Processing SystemsDec-24-2025, 03:03:09 GMT

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

We study how to learn a policy with compositional generalizability. We propose a two-stage framework, which refactorizes a high-reward teacher policy into a generalizable student policy with strong inductive bias. Particularly, we implement an object-centric GNN-based student policy, whose input objects are learned from images through self-supervised learning. Empirically, we evaluate our approach on four difficult tasks that require compositional generalizability, and achieve superior performance compared to baselines.

compositional generalizability, refactoring policy, self-supervised object proposal, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)

Neural Information Processing SystemsDec-23-2025, 16:40:04 GMT

Offline Multi-Agent Reinforcement Learning with Knowledge Distillation

We introduce an offline multi-agent reinforcement learning ( offline MARL) framework that utilizes previously collected data without additional online data collection. Our method reformulates offline MARL as a sequence modeling problem and thus builds on top of the simplicity and scalability of the Transformer architecture. In the fashion of centralized training and decentralized execution, we propose to first train a teacher policy as if the MARL dataset is generated by a single agent. After the teacher policy has identified and recombined the good behavior in the dataset, we create separate student policies and distill not only the teacher policy's features but also its structural relations among different agents' features to student policies. Despite its simplicity, the proposed method outperforms state-of-the-art model-free offline MARL baselines while being more robust to demonstration's quality on several environments.

knowledge distillation, name change, offline multi-agent reinforcement learning, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.65)

arXiv.org Artificial IntelligenceDec-12-2025

Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input

Xu, Zifan, Seo, Myoungkyu, Lee, Dongmyeong, Fu, Hao, Hu, Jiaheng, Cui, Jiaxun, Jiang, Yuqian, Wang, Zhihan, Brund, Anastasiia, Biswas, Joydeep, Stone, Peter

Learning fast and robust ball-kicking skills is a critical capability for humanoid soccer robots, yet it remains a challenging problem due to the need for rapid leg swings, postural stability on a single support foot, and robustness under noisy sensory input and external perturbations (e.g., opponents). This paper presents a reinforcement learning (RL)-based system that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball-goal configurations. The system extends a typical teacher-student training framework -- in which a "teacher" policy is trained with ground truth state information and the "student" learns to mimic it with noisy, imperfect sensing -- by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements -- including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement -- are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty. Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball-goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a system for learning robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.

arxiv preprint arxiv, machine learning, reinforcement learning, (14 more...)

2512.06571

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Soccer Robots (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial IntelligenceDec-1-2025

VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation

He, Tairan, Wang, Zi, Xue, Haoru, Ben, Qingwei, Luo, Zhengyi, Xiao, Wenli, Yuan, Ye, Da, Xingye, Castañeda, Fernando, Sastry, Shankar, Liu, Changliu, Shi, Guanya, Fan, Linxi, Zhu, Yuke

A key barrier to the real-world deployment of humanoid robots is the lack of autonomous loco-manipulation skills. W e introduce VIRAL, a visual sim-to-real framework that learns humanoid loco-manipulation entirely in simulation and deploys it zero-shot to real hardware. VIRAL follows a teacher-student design: a privileged RL teacher, operating on full state, learns long-horizon loco-manipulation using a delta action space and reference state initialization. A vision-based student policy is then distilled from the teacher via large-scale simulation with tiled rendering, trained with a mixture of online DAgger and behavior cloning. W e find that compute scale is critical: scaling simulation to tens of GPUs (up to 64) makes both teacher and student training reliable, while low-compute regimes often fail. T o bridge the sim-to-real gap, VIRAL combines large-scale visual domain randomization over lighting, materials, camera parameters, image quality, and sensor delays--with real-to-sim alignment of the dexterous hands and cameras. Deployed on a Unitree G1 humanoid, the resulting RGB-based policy performs continuous loco-manipulation for up to 54 cycles, generalizing to diverse spatial and appearance variations without any real-world fine-tuning, and approaching expert-level teleoperation performance. Extensive ablations dissect the key design choices required to make RGB-based humanoid loco-manipulation work in practice.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2511.152

Genre: Research Report (0.82)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceNov-18-2025

DiffuDepGrasp: Diffusion-based Depth Noise Modeling Empowers Sim2Real Robotic Grasping

Zhou, Yingting, Cui, Wenbo, Liu, Weiheng, Chen, Guixing, Li, Haoran, Zhao, Dongbin

Transferring the depth-based end-to-end policy trained in simulation to physical robots can yield an efficient and robust grasping policy, yet sensor artifacts in real depth maps like voids and noise establish a significant sim2real gap that critically impedes policy transfer. Training-time strategies like procedural noise injection or learned mappings suffer from data inefficiency due to unrealistic noise simulation, which is often ineffective for grasping tasks that require fine manipulation or dependency on paired datasets heavily. Furthermore, leveraging foundation models to reduce the sim2real gap via intermediate representations fails to mitigate the domain shift fully and adds computational overhead during deployment. This work confronts dual challenges of data inefficiency and deployment complexity. We propose DiffuDepGrasp, a deploy-efficient sim2real framework enabling zero-shot transfer through simulation-exclusive policy training. Its core innovation, the Diffusion Depth Generator, synthesizes geometrically pristine simulation depth with learned sensor-realistic noise via two synergistic modules. The first Diffusion Depth Module leverages temporal geometric priors to enable sample-efficient training of a conditional diffusion model that captures complex sensor noise distributions, while the second Noise Grafting Module preserves metric accuracy during perceptual artifact injection. With only raw depth inputs during deployment, DiffuDepGrasp eliminates computational overhead and achieves a 95.7% average success rate on 12-object grasping with zero-shot transfer and strong generalization to unseen objects.Project website: https://diffudepgrasp.github.io/.

large language model, machine learning, natural language, (16 more...)

2511.12912

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

arXiv.org Artificial IntelligenceOct-28-2025

DexSinGrasp: Learning a Unified Policy for Dexterous Object Singulation and Grasping in Densely Cluttered Environments

Xu, Lixin, Liu, Zixuan, Gui, Zhewei, Guo, Jingxiang, Jiang, Zeyu, Zhang, Tongzhou, Xu, Zhixuan, Gao, Chongkai, Shao, Lin

Abstract-- Grasping objects in cluttered environments remains a fundamental yet challenging problem in robotic manipulation. While prior works have explored learning-based synergies between pushing and grasping for two-fingered grippers, few have leveraged the high degrees of freedom (DoF) in dexterous hands to perform efficient singulation for grasping in cluttered settings. In this work, we introduce DexSinGrasp, a unified policy for dexterous object singulation and grasping. DexSinGrasp enables high-dexterity object singulation to facilitate grasping, significantly improving efficiency and effectiveness in cluttered environments. We incorporate clutter arrangement curriculum learning to enhance success rates and generalization across diverse clutter conditions, while policy distillation enables a deploy-able vision-based grasping strategy. T o evaluate our approach, we introduce a set of cluttered grasping tasks with varying object arrangements and occlusion levels. Experimental results show that our method outperforms baselines in both efficiency and grasping success rate, particularly in dense clutter . Dexterous grasping of target objects in cluttered environments is crucial for various applications, from production lines [1] to assembly processes [2], [3] and beyond.

artificial intelligence, clutter, machine learning, (15 more...)

2504.04516

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Agarwal, Shantnav, Alonso-Mora, Javier, Sun, Sihao

Decentralized Real-Time Planning for Multi-UAV Cooperative Manipulation via Imitation Learning

arXiv.org Artificial IntelligenceOct-21-2025

Abstract-- Existing approaches for transporting and manipulating cable-suspended loads using multiple UA Vs along reference trajectories typically rely on either centralized control architectures or reliable inter-agent communication. In this work, we propose a novel machine learning-based method for decentralized kinodynamic planning that operates effectively under partial observability and without inter-agent communication. Our method leverages imitation learning to train a decentralized student policy for each UA V by imitating a centralized kinodynamic motion planner with access to privileged global observations. The student policy generates smooth trajectories using physics-informed neural networks that respect the derivative relationships in motion. During training, the student policies utilize the full trajectory generated by the teacher policy, leading to improved sample efficiency. Moreover, each student policy can be trained in under two hours on a standard laptop. We validate our method in both simulation and real-world environments to follow an agile reference trajectory, demonstrating performance comparable to that of centralized approaches. Unmanned aerial vehicles (UA Vs) have gained significant traction across domains such as surveillance, agriculture, and infrastructure inspection due to their agility and versatility. However, their limited payload capacity restricts their effectiveness in applications involving the transportation of heavy or bulky objects which is common in construction and large-scale logistics. A scalable and cost-effective solution to this limitation is cable-suspended cooperative aerial manipulation [1], where multiple UA Vs cooperatively transport and control a cable-suspended payload. This method enables full pose manipulation of objects whose weight may exceed the capacity of a single UA V . Numerous control strategies have been proposed for cooperative transportation of suspended payloads using UA V teams. These approaches vary in terms of modeling accuracy, scalability, communication requirements, and capability to regulate the full pose of the payload. Given the focus of this work on decentralized cooperative aerial manipulation, prior methods are categorized into three primary frameworks: centralized control, decentralized control with communication, and decentralized control without communication. Figure 1: We enable decentralized cooperative aerial manipulation through student policies that operate independently using only the ego UA V's state and the pose of the load. These student policies are trained via imitation learning from a centralized teacher policy with privileged observations, including the full state of the other UA Vs and the load. The policy has been tested in real-world environments, where three UA Vs cooperatively manipulate a cable-suspended load.

artificial intelligence, machine learning, trajectory, (17 more...)

2510.17143

Genre: Research Report > New Finding (0.68)

Industry:

Law (0.54)
Information Technology (0.48)
Leisure & Entertainment (0.47)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.70)

arXiv.org Artificial IntelligenceOct-20-2025

From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance

Li, Zhe, Chi, Cheng, Wei, Yangyang, Zhu, Boan, Peng, Yibo, Huang, Tao, Wang, Pengwei, Wang, Zhongyuan, Zhang, Shanghang, Xu, Chang

Natural language offers a natural interface for humanoid robots, but existing language-guided humanoid locomotion pipelines remain cumbersome and untrustworthy. They typically decode human motion, retarget it to robot morphology, and then track it with a physics-based controller. However, this multi-stage process is prone to cumulative errors, introduces high latency, and yields weak coupling between semantics and control. These limitations call for a more direct pathway from language to action, one that eliminates fragile intermediate stages. Therefore, we present RoboGhost, a retargeting-free framework that directly conditions humanoid policies on language-grounded motion latents. By bypassing explicit motion decoding and retargeting, RoboGhost enables a diffusion-based policy to denoise executable actions directly from noise, preserving semantic intent and supporting fast, reactive control. A hybrid causal transformer-diffusion motion generator further ensures long-horizon consistency while maintaining stability and diversity, yielding rich latent representations for precise humanoid behavior. Extensive experiments demonstrate that RoboGhost substantially reduces deployment latency, improves success rates and tracking precision, and produces smooth, semantically aligned locomotion on real humanoids. Beyond text, the framework naturally extends to other modalities such as images, audio, and music, providing a universal foundation for vision-language-action humanoid systems.

arxiv preprint arxiv, machine learning, natural language, (16 more...)

2510.14952

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.35)